Chapter 4 Exploring Spectroscopy Data

You’re bound to encounter some form of spectroscopy data during your chemistry career. Most of the instruments you’ll use to acquire spectroscopy data will also have software that allows you to explore and analyze your recorded spectra. However, this software is often proprietary, and as a student it’s extremely difficult to get your hands on a working copy, let alone a working copy for each instrument you’ll use. Fortunately, you can easily export your spectra as a .csv file containing all the data you’ll need to reproduce your spectra at home.

This chapter outlines how you can use R to easily create interactive plots, and how to use them to explore your spectroscopy data.

4.1 Setting up data for efficient plotting

Alright, let’s import an example ATR FT-IR dataset. Different programs typically have their own export layout, but you can tidy this up in Excel during the lab. The example here comes from an experiment in CHM 317 where students use ATR FT-IR to compare the polymer composition of consumer products against known plastics.

library(tidyverse)

spectrum <- read_csv("./data/CHM317/ATR_plastics.csv") 

head(spectrum, n = 10)
## # A tibble: 10 x 11
##    wavenumber  EPDM Neoprene  Mylar   PTFE    PVC Polystyrene Polyethylene
##         <dbl> <dbl>    <dbl>  <dbl>  <dbl>  <dbl>       <dbl>        <dbl>
##  1       550. 0.212    0.296 0.0709 0.0417 0.0174      0.0746     0.000873
##  2       551. 0.212    0.295 0.0709 0.0421 0.0174      0.0746     0.000834
##  3       551. 0.213    0.295 0.0708 0.0424 0.0175      0.0745     0.000819
##  4       552. 0.213    0.294 0.0707 0.0429 0.0175      0.0745     0.000825
##  5       552. 0.214    0.294 0.0707 0.0436 0.0176      0.0745     0.000868
##  6       553. 0.214    0.294 0.0706 0.0443 0.0177      0.0746     0.000949
##  7       553. 0.215    0.293 0.0706 0.0453 0.0177      0.0746     0.00101 
##  8       553. 0.215    0.292 0.0705 0.0455 0.0178      0.0746     0.00103 
##  9       554. 0.216    0.292 0.0704 0.0453 0.0179      0.0746     0.00105 
## 10       554. 0.216    0.291 0.0703 0.0443 0.0179      0.0745     0.00107 
## # ... with 3 more variables: `Sample: eyeglass bag` <dbl>, `Sample: Gloves
## #   (KC500)` <dbl>, `Sample: Shopping bag` <dbl>

Notice how the data is organized here. There’s a column for wavenumber, followed by a column of absorbance readings for each plastic sample. The experiment uses the same method for each sample, so the wavenumber steps are identical between runs. That is why there is only one wavenumber column. This ‘wide’ layout is handy when recording data and organizing your spreadsheet, but it’s not very efficient in R, so we’re going to transform it into a ‘long’ format.

spectrum <- spectrum %>%
  pivot_longer(cols = !'wavenumber',
               names_to = "sample",
               values_to = "absorbance")
head(spectrum, n = 10)
## # A tibble: 10 x 3
##    wavenumber sample                 absorbance
##         <dbl> <chr>                       <dbl>
##  1       550. EPDM                     0.212   
##  2       550. Neoprene                 0.296   
##  3       550. Mylar                    0.0709  
##  4       550. PTFE                     0.0417  
##  5       550. PVC                      0.0174  
##  6       550. Polystyrene              0.0746  
##  7       550. Polyethylene             0.000873
##  8       550. Sample: eyeglass bag     0.0201  
##  9       550. Sample: Gloves (KC500)   0.0451  
## 10       550. Sample: Shopping bag     0.0238

In a long format, each column is a variable and each row is an observation. So in this format you can read across a row and note that at 550.0952 cm^-1, EPDM had an absorbance of 0.212. Once you understand the layout, this format makes the tidyverse family of functions much easier and more intuitive to work with.
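To see the reshaping on something small enough to check by eye, here is a minimal sketch using a made-up two-row tibble. The absorbance values are invented for illustration and are not CHM 317 data. It pivots to long format, pulls out a single observation with ‘filter()’, and then uses ‘pivot_wider()’ to recover the original wide layout, which is useful if you ever need to hand a spreadsheet back.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide-format spectrum: invented values, two wavenumbers, two samples
toy_wide <- tibble(
  wavenumber = c(550, 551),
  EPDM       = c(0.21, 0.22),
  PTFE       = c(0.04, 0.05)
)

# Wide -> long: one row per (wavenumber, sample) observation
toy_long <- toy_wide %>%
  pivot_longer(cols = !wavenumber,
               names_to = "sample",
               values_to = "absorbance")

# Reading a single observation is now a simple filter
epdm_550 <- toy_long %>%
  filter(sample == "EPDM", wavenumber == 550)

# Long -> wide: recovers the original spreadsheet layout
toy_back <- toy_long %>%
  pivot_wider(names_from = sample, values_from = absorbance)
```

The long tibble has one row per wavenumber and sample pair (four rows here). Converting back with ‘pivot_wider()’ gives the same columns as the starting table.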

4.2 Plotting data

Because our data is tidied (i.e. set up properly), we can easily plot everything at once:

fig <- ggplot(spectrum, aes(x = wavenumber,
                     y = absorbance, 
                     colour = sample)) +
       geom_point()
fig

So here we see that each sample/plastic has its own spectrum, drawn in its own colour, and we can easily compare them all to each other. However, this plot is a bit ugly. Ugly plots have their place, most notably when you’re just exploring your data and seeing what sticks. To present your plot in a lab report, though, you’ll need to clean it up a bit. Let’s give that a go:

fig <- ggplot(spectrum, aes(x = wavenumber,
                     y = absorbance, 
                     colour = sample)) +
       geom_path() +
  labs(title = "ATR FT-IR spectra of various plastics", 
       subtitle = "Spectra recorded with a Thermo Scientific iS50",
       caption = "(Data from the CHM 317 class of 2019.)") +
  xlab("Wavenumber (cm^-1)") +
  ylab("Absorbance") +
  theme_classic()

fig

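Note the use of ‘geom_path()’ in our new plot. It connects the observations in the order they appear in the data, drawing each spectrum as a continuous line rather than as individual points. This works here because, within each sample, the rows are already in increasing wavenumber order. ‘geom_line()’ would instead sort the points by x before connecting them.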
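‘geom_path()’ joins points in the order they appear in the data, so it is worth making that order explicit. The sketch below uses a small invented stand-in for the long-format ‘spectrum’ tibble; the values are not real data and the rows are deliberately shuffled. It does three things:

  • It keeps only the samples you want to compare with ‘filter()’, so an unknown can be set against one reference.
  • It sorts by wavenumber with ‘arrange()’ before drawing.
  • It optionally reverses the x-axis with ‘scale_x_reverse()’, since IR spectra are conventionally plotted with wavenumber decreasing from left to right.

```r
library(dplyr)
library(ggplot2)

# Invented stand-in for the long-format `spectrum` tibble (not real data);
# rows are deliberately out of wavenumber order
toy_spectrum <- tibble(
  wavenumber = c(552, 550, 551, 552, 550, 551, 550, 551, 552),
  sample     = rep(c("Polyethylene", "Sample: Shopping bag", "PTFE"), each = 3),
  absorbance = c(0.003, 0.001, 0.002, 0.030, 0.024, 0.027, 0.042, 0.043, 0.044)
)

toy_compare <- toy_spectrum %>%
  # keep only the samples you want to compare (drops PTFE here)
  filter(sample %in% c("Polyethylene", "Sample: Shopping bag")) %>%
  # geom_path() joins points in row order, so sort by wavenumber within each sample
  arrange(sample, wavenumber)

toy_fig <- ggplot(toy_compare, aes(x = wavenumber,
                                   y = absorbance,
                                   colour = sample)) +
  geom_path() +
  # optional: IR convention of high wavenumber on the left
  scale_x_reverse() +
  labs(x = "Wavenumber (cm^-1)", y = "Absorbance") +
  theme_classic()

toy_fig
```

Swapping ‘geom_path()’ for ‘geom_line()’ would draw the same lines here without the ‘arrange()’ step, because ‘geom_line()’ sorts by x itself.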

4.3 Creating interactive plots

Alright, so far what we’ve shown isn’t any different than what you could do in Excel or similar programs. An advantage of R is that you can use packages such as ‘plotly’ to easily create interactive graphs. Interactive graphs are very powerful for analyzing spectroscopy data in three ways:

  • You can zoom in to investigate small peaks.
  • You can crawl along the spectra to see how your samples’ absorbances evolve.
  • You can readily compare samples to each other.

All that being said, let’s load the ‘plotly’ package and transform our above plot into an interactive plot.

library(plotly)

plotlyFig <- ggplotly(fig)
plotlyFig
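By default the hover box lists every mapped aesthetic. ‘ggplotly()’ accepts a ‘tooltip’ argument naming which aesthetics to show, and plotly’s ‘rangeslider()’ adds a draggable slider under the x-axis, which is handy for panning along a long spectrum. The sketch below applies both to a small invented stand-in rather than the real ‘spectrum’ data. Treat it as one possible setup, not the required one.

```r
library(ggplot2)
library(plotly)

# Invented stand-in for the long-format `spectrum` tibble (not real data)
toy_spectrum <- data.frame(
  wavenumber = rep(c(550, 551, 552), times = 2),
  sample     = rep(c("EPDM", "PTFE"), each = 3),
  absorbance = c(0.212, 0.213, 0.214, 0.042, 0.043, 0.044)
)

toy_fig <- ggplot(toy_spectrum,
                  aes(x = wavenumber, y = absorbance, colour = sample)) +
  geom_path() +
  theme_classic()

# tooltip picks which mapped aesthetics appear in the hover box;
# rangeslider() adds a draggable x-axis slider for panning along the spectrum
toy_plotly <- ggplotly(toy_fig, tooltip = c("x", "y", "colour")) %>%
  rangeslider()

toy_plotly
```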

4.3.1 Some notes about working with Plotly

You don’t have to worry too much about what’s going on under the hood with Plotly, but you should be aware of the following:

  • Interactive plotly plots only work in ‘.html’ format. If you print them or export them as a PDF, you’ll lose the interactive element.
  • If you notice something neat when you zoom in, you can use the “snapshot” button to take a picture for your report.
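Since the interactivity only survives in HTML, one way to share a plot alongside a PDF report is to save it as a standalone web page. The sketch below assumes the ‘htmlwidgets’ package, which plotly depends on, and uses a hypothetical output name, ‘atr_spectra.html’. Setting ‘selfcontained = FALSE’ writes the JavaScript libraries into a neighbouring folder instead of bundling them into the single file.

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

# Invented stand-in data (not real spectra)
toy_spectrum <- data.frame(
  wavenumber = c(550, 551, 552),
  sample     = "EPDM",
  absorbance = c(0.212, 0.213, 0.214)
)

toy_plotly <- ggplotly(
  ggplot(toy_spectrum, aes(x = wavenumber, y = absorbance, colour = sample)) +
    geom_path()
)

# Hypothetical file name; written to the current working directory.
# selfcontained = FALSE puts the JavaScript libraries in a sibling folder.
out_file <- "atr_spectra.html"
saveWidget(toy_plotly, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```

Opening the saved file in any web browser should give the same zoom and hover behaviour as the plot in RStudio. If you move it, keep the accompanying library folder with it.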